How do social scientists
answer questions using data?

PSCI 2270 - Week 4

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

September 21, 2023

Plan for this week



  1. Question: What data can we collect to study factors that affect election participation?

  2. Applying CLT/LLN for estimation

  3. Logic of causal inference

Plan for this week


  1. Question: What data can we collect to study factors that affect election participation?
  2. Applying CLT/LLN for estimation

Some Building Blocks


  • Probability:

    • Basis for understanding uncertainty in our estimates
    • Statistics is applied probability
  • Law of Large Numbers

    • Perform the same task over and over (draw an observation, draw a sample, etc.)
    • Average of the results converges to the truth
  • Central Limit Theorem:

    • Add up a lot of independent factors
    • Result follows the normal distribution
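Both building blocks can be seen in a quick simulation; a minimal Python sketch, with the distribution (Uniform) and seed chosen arbitrarily for illustration:

```python
import random
import statistics

random.seed(42)

# LLN: the mean of repeated draws from the same distribution
# converges to the true mean (0.5 for Uniform(0, 1)).
draws = [random.random() for _ in range(100_000)]
print(statistics.mean(draws[:100]))  # noisy with only 100 draws
print(statistics.mean(draws))        # very close to 0.5

# CLT: a sum of many independent factors is approximately normal,
# even though each individual factor is uniform, not normal.
sums = [sum(random.random() for _ in range(30)) for _ in range(10_000)]
print(statistics.mean(sums))  # centered near 30 * 0.5 = 15
```

A histogram of `sums` would show the familiar bell shape, despite no normality anywhere in the inputs.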

Large random samples


  • In real data, we will have a set of \(n\) measurements on a variable: \(X_1\) , \(X_2\), … , \(X_n\)

    • \(X_1\) is the age of the first randomly selected registered voter.
    • \(X_2\) is the age of the second randomly selected registered voter, etc.
  • Empirical analyses: sums or means of these \(n\) measurements

    • All statistical procedures involve a statistic, very often a sum or a mean.
    • What are the properties of these sums and means?
    • Can the sample mean of age tell us anything about the population distribution of age?
  • Asymptotics: what can we learn as \(n\) gets big?

Stats Lingo: LLN


Law of Large Numbers (LLN)

Let \(X_1\) , … , \(X_n\) be independent and identically distributed random variables with mean \(\mu\) and finite variance \(\sigma^2\). Then, \(\bar{X}\) converges to \(\mu\) as \(n\) gets large.


  • Intuition: The probability of \(\bar{X}\) being “far away” from \(\mu\) goes to \(0\) as \(n\) gets big

  • The distribution of sample mean “collapses” to population mean
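This "collapse" can be illustrated with a short simulation (the population proportion \(p = 0.6\) is an assumed value for illustration):

```python
import random
import statistics

random.seed(1)
p = 0.6  # assumed population proportion

def sample_mean(n):
    # draw one sample of size n from Bernoulli(p) and return its mean
    return sum(random.random() < p for _ in range(n)) / n

spread = {}
for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(2000)]
    spread[n] = statistics.stdev(means)  # SD of the sampling distribution
    print(n, round(spread[n], 3))
# the spread shrinks as n grows: the distribution collapses onto p
```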

Normal Distribution

  • The normal distribution is the classic “bell-shaped” curve.

    • Extremely ubiquitous in statistics
    • its mean and variance follow standard notation (\(\mu\) and \(\sigma^2\))
    • When \(X\) is distributed normally, we write \(X \sim N ( \mu, \sigma^2 )\)
  • Three key properties:

    • Unimodal: one peak at the mean
    • Symmetric around the mean
    • Everywhere positive: any real value can possibly occur

Stats Lingo: CLT


Central Limit Theorem (CLT)

Let \(X_1\) , … , \(X_n\) be independent and identically distributed random variables with mean \(\mu\) and variance \(\sigma^2\). Then, \(\bar{X}_n\) will be approximately distributed \(N ( \mu, \sigma^2 / n )\) in large samples.


  • Approximation is better as \(n\) goes up \(\Rightarrow\) asymptotics

  • “Sample means tend to be normally distributed as samples get large.”

    • We now know how far away \(\bar{X}_n\) will be from its mean!

Implications of CLT/LLN


  • By CLT, sample mean \(\approx\) normal with mean \(\mu\) and SD of \(\sigma / \sqrt{n}\)

  • By the empirical rule, the sample mean will be within \(2 \times \sigma / \sqrt{n}\) of the population mean 95% of the time

  • We usually have only 1 sample, so we’ll only get 1 sample mean. So why do we care about LLN/CLT?

    • CLT gives us assurances our sample mean won’t be too far from population mean
    • CLT will also help us create measure of uncertainty for our estimates, standard error (SE):

    \[ SE = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} \]
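As a quick numeric check of this formula, with assumed values \(\sigma = 10\) and \(n = 400\):

```python
import math

sigma, n = 10, 400  # assumed values for illustration
se = math.sqrt(sigma**2 / n)
print(se)  # 0.5
assert math.isclose(se, sigma / math.sqrt(n))  # the two forms agree
```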

Putting the Concepts to Work

  • Question: What proportion of the public approves of Biden’s job as president?
  • Latest Gallup poll:

    • Aug. 1-23
    • 1014 adult Americans
    • Telephone interviews
    • Approve (42%), Disapprove (53%)
  • What can we learn about Biden’s approval in the population from this one sample?

Samples from Population



  • Our focus: simple random sample of size \(n\) from some population \(Y_1\) , … , \(Y_n\)

    • Each individual is independently drawn \(\Rightarrow\) \(i.i.d.\) random variables
    • \(Y_i = 1\) if i approves of Biden, \(Y_i = 0\) otherwise
  • Statistical inference is using data to guess something about the population distribution of \(Y_i\)

Point Estimation


  • Point estimation: providing a single “best guess” as to the value of some fixed, unknown quantity of interest, \(\theta\) (read theta)

    • \(\theta\) is a feature of the population distribution
    • Also called: parameters
  • Examples of quantities of interest ( estimands ):

    • \(\mu = \mathbb{E} [ Y_i ]\): the population mean (turnout rate in the population)
    • \(\sigma^2 = \mathrm{Var}[ Y_i ]\): the population variance
    • \(\mu_1 - \mu_0 = \mathbb{E} [ Y_i (1) ] - \mathbb{E} [ Y_i (0) ]\): the population Average Treatment Effect (ATE)
  • These are the things we want to learn about

Estimators


Estimator

An estimator, \(\hat{\theta}\), of some parameter \(\theta\), is some function of the sample: \(\hat{\theta} = h(Y_1 , ... , Y_n )\).

  • An estimate is one particular realization of the estimator

    • Ideally we’d like to know the estimation error \(\hat\theta - \theta\)
    • Problem: \(\theta\) is unknown
    • Solution: figure out the properties of \(\hat{\theta}\) using probability
  • \(\hat{\theta}\) is a random variable because it is a function of sequence of random draws \(\Rightarrow\) \(\hat{\theta}\) has a distribution

Estimating Biden’s support


  • Parameter \(\theta\): population proportion of adults who support Biden
  • There are many (\(\infty\) ?) different possible estimators:

    • \(\hat{\theta} = \bar{Y}_n\) : the sample proportion of respondents who support Biden
    • \(\hat{\theta} = Y_1\) : just use the first observation
    • \(\hat{\theta} = \max( Y_1 , ... , Y_n )\) : pick the maximum of all observations
    • \(\hat{\theta} = 0.5\) : always guess 50% support
  • How good are these different estimators?
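One way to compare them is by simulation: draw many samples from an assumed population and see how each estimator behaves across samples. A sketch, where the "true" \(p = 0.42\) and \(n = 1014\) are assumed to mimic the poll:

```python
import random
import statistics

random.seed(2023)
p, n, reps = 0.42, 1014, 2000  # assumed truth, poll-sized samples

means, firsts, maxes = [], [], []
for _ in range(reps):
    ys = [random.random() < p for _ in range(n)]  # Bernoulli(p) draws
    means.append(sum(ys) / n)  # sample proportion
    firsts.append(ys[0])       # first observation only
    maxes.append(max(ys))      # maximum of all observations

print(statistics.mean(means))   # near 0.42: unbiased and low-variance
print(statistics.mean(firsts))  # near 0.42 on average, but hugely variable
print(statistics.mean(maxes))   # essentially 1: badly biased
# the constant estimator 0.5 has zero variance but is always biased
```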

Survey


  • Assume a simple random sample of n voters: \(n = 1014\)

  • Define random variable \(Y_i\) for Biden’s approval:

    • \(Y_i = 1 \rightarrow\) respondent \(i\) approves of Biden
    • \(Y_i = 0 \rightarrow\) respondent \(i\) disapproves of Biden
  • \(Y_i\) has probability of success \(p\)

    • “probability of success” = “probability of randomly selecting a Biden approver from population”
    • Remember that \(p\) is the expectation of \(Y_i\): \(p = P (Y_i = 1) = \mathbb{E} [ Y_i ]\)

Survey



  • Sample proportion is the same as the sample mean:

\[ \bar{Y} = \frac{1}{n} \sum_{i = 1}^{n} Y_i = \frac{\text{number who support Biden}}{n} \]

  • \(\theta\) \(= p\)

  • \(\hat\theta\) \(= \bar{Y}\)

Sample Mean Properties



\[ \underbrace{\text{sample mean}}_{\bar{Y}} = \underbrace{\text{population mean}}_{p} + \text{chance error} \]

  • Remember: the sample mean is a random variable

    • Different samples give different sample means
    • Chance error “bumps” sample mean away from population mean
    • \(\Rightarrow \bar{Y}\) has a distribution across repeated samples, called the sampling distribution

Central Tendency of the Sample Mean



  • Expectation: average of the estimates across repeated samples

    • By linearity of expectation: \(\mathbb{E}[\bar{Y}] = \mathbb{E}[ Y_i ] = p\)
    • \(\rightarrow\) chance error is \(0\) on average: \[\mathbb{E}[\bar{Y} − p] = \mathbb{E}[\bar{Y}] − p = 0\]
  • Unbiasedness: the sample proportion is on average equal to the population proportion

Spread of the Sample Mean


  • Standard error: how big is the chance error on average?
  • We can use a special rule for binary random variables to calculate the SE:

\[\sqrt{\mathrm{Var}(\bar{Y})} = \sqrt{\frac{p(1 − p)}{n}}\]

  • Problem: we don’t know \(p\)!
  • Solution: estimate the SE

\[\sqrt{\widehat{\mathrm{Var}}(\bar{Y})} = \sqrt{\frac{\bar{Y}(1 − \bar{Y})}{n}} = \sqrt{\frac{0.42 (1 − 0.42)}{1014}} \approx 0.016\]
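A quick check of this calculation with the Gallup numbers:

```python
import math

ybar, n = 0.42, 1014  # Gallup sample proportion and sample size
se_hat = math.sqrt(ybar * (1 - ybar) / n)
print(round(se_hat, 4))  # 0.0155, i.e. roughly 0.016
```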

Confidence Intervals



  • Awesome: Sample proportion is correct on average
  • Awesomer: Get a range of plausible values
  • Confidence interval: way to construct an interval that will contain the true value in some fixed proportion of repeated samples

Using CLT

\[ \bar{Y} − p = \text{chance error}\]

  • How can we figure out a range of plausible chance errors?

    • Find a range of plausible chance errors and add them to \(\bar{Y}\)
  • Central Limit Theorem:

\[\bar{Y} \sim N \left( \underbrace{\mathbb{E}[Y_i]}_{p}, \underbrace{\frac{\mathrm{Var}(Y_i)}{n}}_{\frac{p(1-p)}{n}} \right)\]

  • Chance error: \(\bar{Y} − p\) is approximately normal with mean 0 and SD equal to \(\sqrt{\frac{p(1-p)}{n}}\)

Confidence interval



  • First, choose a confidence level.

    • What percent of chance errors do you want to count as “plausible”?
    • Convention is 95%.
  • \(100 \times (1 − \alpha)\) % confidence interval: \(CI = \bar{Y} \pm z_{\alpha/2} \times SE\)

    • In polling, \(\pm z_{\alpha/2} × SE\) is called the margin of error

CIs for the Gallup Poll


  • Gallup poll: \(\bar{Y} = 0.42\) with an SE of \(0.016\)
  • 90% CI: \[[0.42 − 1.64 × 0.016, 0.42 + 1.64 × 0.016] = [0.394, 0.446]\]
  • 95% CI: \[[0.42 − 1.96 × 0.016, 0.42 + 1.96 × 0.016] = [0.389, 0.451]\]
  • 99% CI: \[[0.42 − 2.58 × 0.016, 0.42 + 2.58 × 0.016] = [0.379, 0.461]\]
  • More confidence \(\rightarrow\) wider intervals
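These intervals can be reproduced with the standard-library `statistics.NormalDist`, using the slide’s \(\bar{Y}\) and SE:

```python
from statistics import NormalDist

ybar, se = 0.42, 0.016  # Gallup estimate and its SE from the slides

cis = {}
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    cis[level] = (ybar - z * se, ybar + z * se)
    print(f"{level:.0%} CI: [{cis[level][0]:.3f}, {cis[level][1]:.3f}]")
    # e.g. 95% CI: [0.389, 0.451]
```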

95% CI’s


[Figure: sequence of slides showing 95% confidence intervals across repeated samples]
Plan for this week


  1. Question: What data can we collect to study factors that affect election participation?

  2. Applying CLT/LLN for estimation

  3. Logic of causal inference

External vs Internal Validity



  • So far we focused on ability to study population using just a sample \(\Rightarrow\) external validity
  • This suffices if all relevant outcomes within the sample are observed

    • e.g. sample mean, median, polarization, etc.
  • But, for causal (“what-if”) questions we cannot observe all relevant outcomes within the sample \(\Rightarrow\) internal validity

    • Why?

Fundamental Problem of Causal Inference


  • Factual vs. Counterfactual
  • Does the minimum wage increase the unemployment rate?

    • Unemployment rate went up after the minimum wage increased
    • Would it have gone up had the minimum wage increase not occurred?
  • Does having a daughter affect a judge’s rulings in court?

    • A judge with a daughter gave a pro-choice ruling.
    • Would they have done that if they had a son instead?
  • Fundamental problem of causal inference

    • Can never observe counterfactuals \(\Leftarrow\) must be inferred

Fundamental Problem in Movies


Hypothetical Example



  • Question: Does having a female head of a village council increase the share of the budget allocated to water sanitation?

  • Setting: 8 randomly sampled villages in Indonesia (some with female and some with male head)

  • Outcome: Share of budget each village spends on water sanitation

Compare Two Villages



Village Head of Council Budget Share
Village 1 Female 15%
Village 2 Male 10%



  • Did the first village have larger share spent on water sanitation because the head of the council was female?

Experimental Lingo


  • Treatment/intervention, \(T_{i}\): Who is head of council in village \(i\) (a.k.a. the independent variable)
  • Treatment (\(T_i = 1\)) group: Villages with female head of council

  • Control (\(T_i = 0\)) group: Villages with male head of council

  • Outcome variable, \(Y_i\): Share of spending


Village \(T_i\) (Head of Council) \(Y_i\) (Budget Share)
Village 1 1 15
Village 2 0 10

Potential Outcomes


  • What does “\(T_i\) causes \(Y_i\)” mean?

    • Would a village have a different budget allocation with a female head than with a male head?
  • Imagine two states of the world: one in which you receive some treatment and another in which you do not \(\Rightarrow\) potential outcomes

    • Treated, \(Y_i (1)\): spending on water sanitation if village \(i\) had a female head
    • Untreated/Control, \(Y_i (0)\): spending on water sanitation if village \(i\) had a male head

Treatment Effect(s)


  • (Individual) Treatment effect: \(Y_i (1) − Y_i (0)\)

    • \(Y_i (1) − Y_i (0) = 0\): gender of village head has no effect on spending on water sanitation
    • \(Y_i (1) − Y_i (0) < 0\): female village head has negative effect on spending on water sanitation
    • \(Y_i (1) − Y_i (0) > 0\): female village head has positive effect on spending on water sanitation
  • Average Treatment Effect (ATE):

    \[ \frac{1}{n} \sum_{i = 1}^{n} Y_i (1) − \frac{1}{n} \sum_{i = 1}^{n} Y_i (0) = \frac{1}{n} \sum_{i = 1}^{n} \left[ Y_i (1) − Y_i (0) \right] \]

    • Difference between average treated and untreated potential outcomes
    • Does having a female head of village lead to an increase in spending on water sanitation on average?
    • Note: You might come close to observing this treatment effect if you can observe \(Y_i (0)\) the instant before an intervention and \(Y_i (1)\) the instant after, but strictly speaking, you are not observing them at the same time \(\Rightarrow\) Focus on ATE
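If we could see the full schedule of potential outcomes, computing the ATE would be trivial. A sketch with made-up potential outcomes for eight villages (assumed values, for illustration only):

```python
# Made-up full schedule of potential outcomes for eight villages
# (in reality we would only ever observe one column per village)
y1 = [15, 12, 22, 15, 11, 15, 30, 13]  # budget share with a female head
y0 = [12, 10, 20, 13, 10, 14, 28, 10]  # budget share with a male head

n = len(y1)
# mean of the individual treatment effects ...
ate = sum(a - b for a, b in zip(y1, y0)) / n
# ... equals the difference of the two means
ate_alt = sum(y1) / n - sum(y0) / n
print(ate)  # 2.0
```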

Back to Fundamental Problem


Village \(T_i\) (Head of Council) \(Y_i\) (Budget Share) \(Y_i (0)\) (Budget Share if Male Head) \(Y_i (1)\) (Budget Share if Female Head)
Village 1 1 15 ??? 15
Village 2 0 10 10 ???


  • Fundamental problem of causal inference:

    • We only observe one of the two potential outcomes.
  • Observe \(Y_i = Y_i (1)\) if \(T_i = 1\) or \(Y_i = Y_i (0)\) if \(T_i = 0\)

  • To infer causal effect, we need to infer the missing counterfactuals!

Matching?



  • Find a similar unit! \(\Rightarrow\) matching

    • Mill’s method of difference
  • Did village spend more on water sanitation because of female council head?

    • \(\rightarrow\) find a village that has male council head but very similar otherwise
  • NJ increased the minimum wage. Causal effect on unemployment?

    • \(\rightarrow\) find a state similar to NJ that didn’t increase minimum wage

Imperfect matches

  • The problem: imperfect matches!

  • Say we match villages \(i\) (treated) and \(j\) (control)

  • Selection Bias: \(Y_i (1) \neq Y_j (1)\) or \(Y_i (0) \neq Y_j (0)\)

  • Those who take treatment may be different from those who take control.

  • How can we correct for that?

RANDOMIZE! 😵‍💫

(Social) Science Squad

Why Does it Work?



  • Fundamental problem of causal inference still prevents us from estimating individual treatment effects, BUT…
  • Random assignment enables us to create two groups whose treated and untreated potential outcomes are the same in expectation
  • The treatment group provides us with a random sample of \(Y_i (1)\), and the control group provides us with a random sample of \(Y_i (0)\)
  • The difference-in-means estimator compares average outcomes between two samples: treatment and control group

\[ \text{Difference-in-means} = \bar{Y}_{\text{treated}} - \bar{Y}_{\text{untreated}} \]
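A small simulation can illustrate why this works: averaging the difference-in-means over many random assignments recovers the true ATE. The potential outcomes below are made up for illustration:

```python
import random
import statistics

random.seed(7)

# Made-up potential outcomes for 8 villages (never fully observed in reality)
y1 = [15, 12, 22, 15, 11, 15, 30, 13]  # budget share with a female head
y0 = [12, 10, 20, 13, 10, 14, 28, 10]  # budget share with a male head
true_ate = statistics.mean(a - b for a, b in zip(y1, y0))  # 2.0

estimates = []
for _ in range(5000):
    treated = set(random.sample(range(8), 4))  # randomly assign 4 villages
    t_mean = statistics.mean(y1[i] for i in treated)
    c_mean = statistics.mean(y0[i] for i in range(8) if i not in treated)
    estimates.append(t_mean - c_mean)

# Any single estimate is noisy, but their average recovers the true ATE
print(true_ate, round(statistics.mean(estimates), 2))
```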

Core Assumptions

  1. Random assignment of subjects to groups: Implies that receiving the treatment is statistically independent of subjects’ potential outcomes
  2. Non-interference: A subject’s potential outcomes reflect only whether they receive the treatment themselves

    • A subject’s potential outcomes are unaffected by how the treatments happened to be allocated
  3. Excludability: A subject’s potential outcomes respond only to the defined treatment, not other extraneous factors that may be correlated with treatment

    • Importance of defining the treatment precisely and maintaining symmetry between treatment and control groups (e.g., through blinding)

Difference-in-means is Unbiased

Under assumptions 1-3, the Difference-in-Means estimator produces unbiased estimates of the Average Treatment Effect (ATE). In other words, in expectation over many random assignments (within the same sample), the Difference-in-Means estimator equals the Average Treatment Effect.

Observational Methods



  • All observational methods in statistics are trying to approximate randomization under assumptions!
  • Regression with covariates: Block possible known (?) confounders

  • Matching: Use known (?) covariates to match units and then compare within matched sets

  • Event study: Use before and after treatment comparison within the same unit

  • Regression Discontinuity: Use some naturally occurring discontinuity (e.g. tax exemption threshold, borders, etc.) to compare units around it

  • Differences-in-Differences: Compare trends between treated and untreated units even if we know there might be differences between them

  • Many others!

Back to Example


Village \(T_i\) (Head of Council) \(Y_i\) (Budget Share) \(Y_i (0)\) (Budget Share if Male Head) \(Y_i (1)\) (Budget Share if Female Head)
Village 1 1 15 ??? 15
Village 2 0 10 10 ???
Village 3 0 20 20 ???
Village 4 1 15 ??? 15
Village 5 0 10 10 ???
Village 6 1 15 ??? 15
Village 7 1 30 ??? 30
Village 8 0 10 10 ???
  • What do we substitute for the ??? entries? We need an educated guess …
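Whatever we substitute for the missing counterfactuals, the difference-in-means estimate itself uses only the observed column. Computing it from the eight-village table:

```python
# Observed (T_i, Y_i) pairs from the eight-village table
data = [(1, 15), (0, 10), (0, 20), (1, 15), (0, 10), (1, 15), (1, 30), (0, 10)]

treated = [y for t, y in data if t == 1]
control = [y for t, y in data if t == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)
print(diff_in_means)  # 18.75 - 12.5 = 6.25
```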

Two approaches

  • Estimate uncertainty of your difference-in-means estimate

    • Re-create the whole schedule of potential outcomes, draw repeated random samples, and see what differences-in-means you could get
    • This can also be done parametrically using regression
  • Hypothesis testing: there may be other substantively interesting treatment effects that we want to compare to what we observe

    • Each individual treatment effect is \(0\)
    • Average treatment effect is \(0\)
    • etc.
  • We call these values of the quantity of interest hypotheses, and we can test them by checking how likely we would be to observe our result under each hypothesis

    • Reconstruct the schedule of potential outcomes assuming the hypothesis is true
    • See how likely your observed value is \(\Rightarrow\) \(p\)-value
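For the sharp null that every individual effect is \(0\), this can be done exactly by enumerating all possible treatment assignments (randomization inference). A sketch using the eight-village data:

```python
import itertools

# Observed Y_i from the eight-village table (villages 1, 4, 6, 7 treated)
ys = [15, 10, 20, 15, 10, 15, 30, 10]

def diff_in_means(treated_idx):
    t = [ys[i] for i in treated_idx]
    c = [ys[i] for i in range(8) if i not in treated_idx]
    return sum(t) / len(t) - sum(c) / len(c)

observed = diff_in_means((0, 3, 5, 6))  # the actual assignment: 6.25

# Sharp null of no effect: the observed ys ARE the potential outcomes,
# so we can recompute the estimate under every possible assignment.
null_dist = [diff_in_means(c) for c in itertools.combinations(range(8), 4)]
p_value = sum(d >= observed for d in null_dist) / len(null_dist)
print(observed, p_value)  # 13 of 70 assignments give a difference >= 6.25
```

Here \(p = 13/70 \approx 0.186\): a difference this large is not unlikely under the null, so these eight villages alone provide weak evidence of an effect.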
